IFT6758 Blog

A template based on the Lanyon Jekyll theme

Milestone 2

Question 2

1


2.1 Shot Counts Histogram, binned by distance:
screenshot
During the 2015-2018 seasons, combined shot and goal counts are highest for shots taken between 5 and 60 feet, and the number of shots taken from farther than 60 feet drops off significantly. The 10-15 foot bin has both the most shots (30124) and the most goals (5997); all other distance bins contain fewer than 2000 shots and fewer than 400 goals each. The 0-5 foot bin has a very low shot count, possibly because a shooter that close to the net is also right on top of the goalie, leaving little room to angle a shot into the net.
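The binning behind this histogram can be sketched with NumPy; the `distances` and `is_goal` arrays below are hypothetical stand-ins for the cleaned 2015-2018 shot data:

```python
import numpy as np

# Hypothetical stand-ins for the cleaned shot data
distances = np.array([3.0, 12.5, 14.0, 33.0, 61.0, 170.0])  # feet from net
is_goal = np.array([0, 1, 1, 0, 0, 1])                      # 1 if the shot scored

bins = np.arange(0, 200 + 5, 5)  # 5-foot bins, matching the histogram
shot_counts, _ = np.histogram(distances, bins=bins)
goal_counts, _ = np.histogram(distances[is_goal == 1], bins=bins)
```

Plotting `shot_counts` and `goal_counts` side by side per bin reproduces the figure.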


2.1 Shot Counts Histogram, binned by angle:
screenshot
During the 2015-2018 seasons, the combined shot and goal counts by shot angle form a roughly bimodal distribution. The most shots were taken between 60-65 degrees and 115-120 degrees, with 15506 and 15958 shots, respectively. No single angle bin clearly has the most goals: goal counts mostly range between 1300-1800 across the 60-120 degree bins. Shot counts drop off rapidly for angles below 45 degrees and above 140 degrees, and goal counts drop off with them.


2.1 2D Histogram:
screenshot
This 2D histogram has one square bin for every (distance bin, angle bin) pair; the darker the blue of a bin, the higher its count. The darkest patches occur where both the distance histogram and the angle histogram peak, e.g. at distances around 10-15 feet combined with angles of either 60-65 degrees or 115-120 degrees. Bins pairing a far distance with an extreme angle are white because no shots were taken from those combinations.

2


2.2 Goal Rate to Distance:
screenshot
Goal rate is highest between 0-5 feet (~0.31) and drops sharply out to 70 feet (~0.03), which is also around where shot count drops significantly. This makes sense: the closer the shot, the less time the goalie has to react, so the higher the chance of scoring. Although the goal rate for shots taken beyond 70 feet fluctuates and is sometimes higher than at closer distances, the sample sizes there are too small for those rates to be reliable, so they should be disregarded.
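The goal-rate curve is just goals divided by shots per distance bin; a minimal sketch, using hypothetical shot data:

```python
import numpy as np

distances = np.array([2.0, 3.0, 4.0, 72.0, 74.0])  # hypothetical shot distances (feet)
is_goal = np.array([1, 0, 0, 0, 1])

bins = np.arange(0, 80 + 5, 5)
shots, _ = np.histogram(distances, bins=bins)
goals, _ = np.histogram(distances[is_goal == 1], bins=bins)

# Goal rate per bin; bins with no shots are left at 0 to avoid division by zero
goal_rate = np.divide(goals, shots,
                      out=np.zeros(len(shots)), where=shots > 0)
```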

2.2 Goal Rate to Angle:
screenshot
Goal rate is highest around 95-100 degrees (~0.13). The shot/goal counts by angle show that shots between 40-140 degrees are the most numerous, so that range has the largest sample size and is the one to focus on. Within it, goal rate ranges from ~0.07 to ~0.13, peaking at ~0.12-0.13 in the 80-100 degree bins. This makes sense: shots taken more directly at the net leave the goalie more area to cover, making them harder to save and goals more likely.

3


2.3 Goals only, binned by distance, separated by empty net and non-empty net events:
screenshot
The most goals were scored between 10-15 feet, being 5997 (98 empty net, 5899 non-empty net). Goal count drops rapidly as distance increases for both empty and non-empty nets, which makes sense, as scoring chances should be higher the closer the shooter is to the net. However, there is a strange small peak in goal counts around 170-175 feet, which is almost the distance to the net at the opposite end of the rink. We investigated this and found that the x and y coordinates were actually logged incorrectly, i.e. on the wrong side of the rink. This can be proven by game_id 2015020671, eventIdx 404. On January 17, 2016, in the Canucks vs Islanders game, during the penalty shootout, Radim Vrbata (Canucks) scored on Jaroslav Halak (Islanders) right in front of the net, but the coordinates were logged on the opposite side. Clearly, this was a logging mistake. See https://www.youtube.com/watch?v=xQjKUsl1a9I at 5:11 for reference.

Question 3

1


We got a validation accuracy of 90.4%, which is quite high. Investigating the model's predictions on the validation set, we realized it predicts every shot to be a non-goal. A likely explanation is class imbalance: of the 311106 labels in the training data, 29187 are goals and 281919 are non-goals, i.e. 0.094 and 0.906 of the total, respectively. This biases the model, letting it achieve ~90% accuracy just by predicting every shot to be a non-goal. Other potential explanations are that the features are only weakly correlated with the labels, that the model was unable to learn anything from the features, or both.
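The majority-class effect can be checked with simple arithmetic using the counts above:

```python
n_goals, n_non_goals = 29187, 281919
total = n_goals + n_non_goals  # 311106 labels

# Accuracy of a trivial classifier that always predicts "not a goal"
majority_baseline_acc = n_non_goals / total
goal_fraction = n_goals / total
```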

2

(no question)

3



The ROC (Receiver Operating Characteristic) curve shows the discriminative ability of a binary classifier: the TPR (true positive rate) is plotted against the FPR (false positive rate), so classifiers whose ROC curve sits closer to the top-left corner perform better, and correspondingly a higher AUC (area under the curve) indicates better performance. The Logistic Regression models trained on [distance from net] and on [distance and angle from net] tied for the best ROC curve with an AUC of 0.67, which can also be seen from the red (covered) and pink curves being "pulled" towards the (0, 1.0) corner. The logistic regression model trained on [angle from net] achieved an AUC of 0.5, the same as the random baseline, so it is clearly inferior to the other two models.
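The AUC comparison can be reproduced with scikit-learn's `roc_auc_score`; the labels and scores below are synthetic stand-ins that simply contrast an informative score with a pure-noise score such as [angle from net]:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=500)

# A score correlated with the label vs. pure noise
informative_score = y_true + rng.normal(size=500)
noise_score = rng.normal(size=500)

auc_informative = roc_auc_score(y_true, informative_score)  # well above 0.5
auc_noise = roc_auc_score(y_true, noise_score)              # near the 0.5 baseline
```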
Ideally, the higher the shot probability model percentile, the higher the goal rate. This holds for the models trained on [distance from net] and [distance and angle from net]: their higher predicted goal probabilities are positively correlated with actual goal rates. The model trained on [angle from net] shows no correlation between shot probability model percentile and goal rate, indicating that it could not learn to predict accurately from the [angle from net] feature alone.
It makes sense to see the cumulative proportion of goals increase with shot probability model percentile, and all curves eventually reach 1.0, as shown. However, the curves for [distance from net] and [distance and angle from net] increase at a non-uniform pace: as shot probability model percentile increases, the rate at which the cumulative proportion of goals grows also increases. This makes sense, since higher predicted shot probabilities should correspond to more goals, so the cumulative proportion accumulates faster there.
A perfectly calibrated model lies on the y=x line, so calibration curves closer to y=x are better. The Logistic Regression models trained on [distance from net] and [distance and angle from net] achieved very similar calibration curves near y=x. Their predicted probabilities are better calibrated than those of the model trained on [angle from net], which produced only a single point on the plot.

4


Logistic Regression, trained on distance only: comet link

Logistic Regression, trained on angle only: comet link

Logistic Regression, trained on both distance and angle: comet link

Question 4

5


comet link

#NOTE the csv is saved in /assets/milestone2


The list of all of the features:

  • eventIdx: unique event identifier per game
  • game_id: unique game identifier
  • Game Seconds: number of seconds that passed in the game
  • X-Coordinate: the x-coordinate of where the event occurred
  • Y-Coordinate: the y-coordinate of where the event occurred
  • Shot Distance: the distance from where the shot was taken from
  • Shot Angle: the angle from where the shot was taken from
  • Shot Type: the type of shot taken
  • Was Net Empty: whether or not the net was empty when a goal was scored
  • Last Event Type: what the last event was
  • Last X-Coordinate: the x-coordinate of where the last event occurred
  • Last Y-Coordinate: the y-coordinate of where the last event occurred
  • Time from Last Event (seconds): the number of seconds that passed since the last event
  • Distance from Last Event: the distance from the location of the last event
  • Is Rebound: whether or not the current event is a rebound (by checking whether the last event was also a shot)
  • Change in Shot Angle: the difference in shot angle from the last event (only if this shot is a rebound)
  • Speed: the distance from the last event divided by the time since the last event
  • Is Goal: whether or not this event resulted in a goal
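A sketch of how the last-event features can be derived with pandas; the column names follow the list above, while the two sample rows are hypothetical:

```python
import numpy as np
import pandas as pd

# Two hypothetical consecutive events from one game
df = pd.DataFrame({
    "Game Seconds": [100, 102],
    "X-Coordinate": [0.0, 30.0],
    "Y-Coordinate": [0.0, 40.0],
    "Last Event Type": ["Faceoff", "Shot"],
})

df["Time from Last Event (seconds)"] = df["Game Seconds"].diff()
df["Distance from Last Event"] = np.hypot(df["X-Coordinate"].diff(),
                                          df["Y-Coordinate"].diff())
df["Is Rebound"] = df["Last Event Type"].eq("Shot")
df["Speed"] = df["Distance from Last Event"] / df["Time from Last Event (seconds)"]
```

The second row moves 50 feet in 2 seconds, giving a Speed of 25 feet per second.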

Question 5

1


screenshot

screenshot

screenshot

screenshot

distance-only: comet link

angle-only: comet link

distance and angle: comet link



We used sklearn.model_selection.train_test_split to split the data into 80% train and 20% validation. For the XGBoost classifier trained on [distance from net], [angle from net], [distance and angle from net], we got validation accuracies of 90.39%, 90.40%, and 90.39%, respectively. This is the same as the validation accuracies for the Logistic Regression classifier in question 3 trained on [distance from net], [angle from net], [distance and angle from net].
Comparing ROC curves, for the features [distance from net], [angle from net], [distance and angle from net], Logistic Regression got AUCs of 0.67, 0.50, 0.67, respectively, and XGBoost got AUCs of 0.68, 0.62, 0.70, respectively. So, we can see that XGBoost turned out to be a better performing classifier than Logistic Regression for these cases.
The goal rate for all 3 XGBoost curves trend upwards as shot probability model percentile increases. This is an improvement over the Logistic Regression model as Logistic Regression’s goal rate curve did not improve when trained using [angle from net]. So, this suggests that XGBoost was able to learn something from the data from all 3 different subsets of features, since all 3 curves increased rather than stayed at the same goal rate.
The cumulative proportion of goals for all 3 XGBoost curves starts off increasing slowly and then speeds up as shot probability model percentile increases. This is consistent with the goal rate curves, since a well-performing model should predict higher shot probabilities where actual goal rates are also higher. Since all 3 XGBoost curves increase at an accelerating rate, this is an improvement over the Logistic Regression curves, where the model trained on [angle from net] increased at a constant pace.
The calibration curves for XGBoost trained on [distance from net], [angle from net], [distance and angle from net] were all very close to the perfectly calibrated line where y=x until around when mean predicted probability was 0.2. This means the model was calibrated well up to 0.2 mean predicted probability. All 3 XGBoost curves also appear to be better calibrated than the Logistic Regression’s 3 curves.

2


screenshot

screenshot

screenshot

screenshot

We used grid search to tune the hyperparameters of the XGBoost model trained on all of the features. We searched over the number of estimators, max depth, learning rate, and the type of booster. The best validation accuracy we got was 91.27%, with booster gbtree, learning rate 0.05, max depth 10, and 100 estimators. The ROC curve shows improvement over the XGBoost models trained on only [distance from net], [angle from net], or [distance and angle from net]: XGBoost trained on all features achieved an AUC of 0.77, beating the previous best XGBoost AUC of 0.7. The goal rate curve also becomes steeper as shot probability model percentile increases, especially approaching 0.8, and the same is true of the cumulative proportion of goals curve. The calibration curve appears well calibrated up to around 0.4 mean predicted probability, an improvement over the previous XGBoost models, which were only well calibrated up to around 0.2. Judging from these results, the XGBoost model trained on all features with tuned hyperparameters is an overall improvement over the XGBoost baseline.
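A minimal sketch of the grid-search setup. Scikit-learn's GradientBoostingClassifier stands in here for XGBoost (the booster parameter is therefore omitted), the data is synthetic, and the grid values are illustrative:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))                                   # synthetic features
y = (X[:, 0] + rng.normal(scale=0.5, size=200) > 0).astype(int)

param_grid = {
    "n_estimators": [10, 50],
    "max_depth": [2, 4],
    "learning_rate": [0.05, 0.1],
}
search = GridSearchCV(GradientBoostingClassifier(random_state=0), param_grid, cv=3)
search.fit(X, y)
best_params, best_score = search.best_params_, search.best_score_
```

In the actual experiments, the estimator was xgboost's XGBClassifier with the grid described above.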


comet link

3


Technique 1: Removing features with low variance

  • By setting a variance threshold, we can calculate the variance of all the features and remove the ones who do not meet the threshold.
  • comet link
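A minimal VarianceThreshold sketch (the toy matrix and threshold value are illustrative):

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

X = np.array([
    [0.0, 1.0, 10.0],
    [0.0, 2.0, 20.0],
    [0.0, 3.0, 30.0],
])  # the first column is constant, so its variance is 0

selector = VarianceThreshold(threshold=0.1)
X_reduced = selector.fit_transform(X)  # drops the zero-variance column
```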


Technique 2: Univariate Selection

  • Select the features that have the strongest relationship with the output variable based on some statistical test. We used the chi-squared test to compute the chi-squared stat between each feature and class. We then select the k features that had the highest scores.
  • comet link
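A sketch of univariate selection with SelectKBest and the chi-squared test (toy data; the first column tracks the class strongly, the second barely does):

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

# chi2 requires non-negative features
X = np.array([[1, 0], [2, 0], [8, 1], [9, 1]], dtype=float)
y = np.array([0, 0, 1, 1])

selector = SelectKBest(chi2, k=1).fit(X, y)  # keep the single highest-scoring feature
X_new = selector.transform(X)
```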


Technique 3: Recursive Feature Elimination

  • Features are recursively removed, based on their importance scores, until only the specified number of features remains. Importance scores are calculated from how much each feature contributes to predicting the target class.
  • comet link
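A sketch of Recursive Feature Elimination wrapping a logistic regression; the synthetic data makes only the first feature informative:

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = (X[:, 0] > 0).astype(int)  # only the first feature determines the class

# Recursively drop the least important feature until 2 remain
rfe = RFE(LogisticRegression(), n_features_to_select=2).fit(X, y)
```

`rfe.support_` marks the kept features; the informative first feature survives the elimination.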


Technique 4: Tree-based Feature Selection

  • Use tree-based estimators (in our case, the Extra-tree Classifier, which is an extremely randomized tree classifier) to compute impurity-based feature importances. Then, sort the features based on their importance level and then only keep the most important ones.
  • comet link
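A sketch of tree-based selection with the Extra-Trees Classifier; keeping features above the mean importance is an illustrative threshold choice:

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] > 0).astype(int)  # only the first feature is informative

forest = ExtraTreesClassifier(n_estimators=50, random_state=0).fit(X, y)
importances = forest.feature_importances_

# Keep features whose impurity-based importance is above the mean importance
keep = importances >= importances.mean()
X_reduced = X[:, keep]
```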


The optimal set of features came from Univariate Selection. The original 18 features were reduced to 14, specifically [‘eventIdx’, ‘game_id’, ‘Game Seconds’, ‘Game Period’, ‘Y-Coordinate’, ‘Shot Distance’, ‘Shot Type’, ‘Was Net Empty’, ‘Last Event Type’, ‘Last Y-Coordinate’, ‘Time from Last Event (seconds)’, ‘Distance from Last Event’, ‘Is Rebound’, ‘Speed’]. When tested with a Logistic Regression model, these features improved the validation accuracy from 90.39% to 91.23%.

Question 6

1 Figures and Discussions

Approach 1: Different model type: Decision Tree Classifier


A supervised machine learning classification model in which the data is split at each level on some feature until it is categorized into a class. Using the scikit-learn DecisionTreeClassifier with default parameters, we got a validation accuracy of 84.39% and an AUC of 0.57.
ROC
screenshot

Goal
screenshot

Cumulative
screenshot

Calibration
screenshot

Approach 2: Hyperparameter Tuning: Decision Tree Classifier with Randomized Search on Hyperparameters and Regularization


We tuned the hyperparameters of the decision tree model using randomized search. We searched through the splitter, max_depth, min_samples_split, min_samples_leaf, max_features, max_leaf_nodes parameters and got a validation accuracy of 90.82%. The optimal parameter settings were: ‘splitter’: ‘best’, ‘min_samples_split’: 0.9, ‘min_samples_leaf’: 0.1, ‘max_leaf_nodes’: 2, ‘max_features’: 4, ‘max_depth’: 48.
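The randomized search can be sketched as follows; the parameter space mirrors the one described above, with illustrative value lists and synthetic data:

```python
import numpy as np
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 1] > 0).astype(int)  # synthetic, separable on the second feature

param_distributions = {
    "splitter": ["best", "random"],
    "max_depth": range(1, 50),
    "min_samples_split": [2, 0.5, 0.9],
    "min_samples_leaf": [1, 0.1, 0.2],
    "max_features": [1, 2, None],
    "max_leaf_nodes": [2, 5, 10, None],
}
search = RandomizedSearchCV(DecisionTreeClassifier(random_state=0),
                            param_distributions, n_iter=10, cv=3, random_state=0)
search.fit(X, y)  # samples 10 random settings instead of the full grid
```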
ROC
screenshot

Goal
screenshot

Cumulative
screenshot

Calibration
screenshot

Approach 3: More advanced feature selection strategy: Decision Tree Classifier with PCA feature reduction


We used PCA to reduce the features to 3 components, then got a validation accuracy of 82.86% with the Decision Tree model.
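The reduction step can be sketched with scikit-learn's PCA; the random matrix stands in for the full feature matrix:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))  # stand-in for the full feature matrix

pca = PCA(n_components=3).fit(X)
X_reduced = pca.transform(X)  # project onto the top 3 principal components
```

`X_reduced` then feeds the downstream Decision Tree model in place of the original features.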
ROC
screenshot

Goal
screenshot

Cumulative
screenshot

Calibration
screenshot

Approach 4: Different model type: Multilayer Perceptron Classifier


The Multilayer Perceptron classifier is a type of feedforward artificial neural network. We got a validation accuracy of 90.82% and AUC of 0.5.
ROC
screenshot

Goal
screenshot

Cumulative
screenshot

Calibration
screenshot

Summary
Both approaches 2 and 4 achieved a validation accuracy of 90.82% and an AUC of 0.5. These are our best ‘final’ models. However, we would pick the MLP as our best model over the Decision Tree Classifier with randomized hyperparameter search, because it achieves the same result without additional hyperparameter tuning.

Approach 1

comet link

Approach 2

comet link

Approach 3

comet link

Approach 4

Question 7

1 - Regular Seasons


For the Logistic Regression models trained on [distance from net], [angle from net], and [distance and angle from net], testing on the untouched 2019/20 regular season dataset yielded AUCs of 0.68, 0.51, and 0.68, respectively, each 0.01 better than the corresponding validation AUCs of 0.67, 0.50, and 0.67. The best XGBoost model saved in part 5 achieved an AUC value of 0.7 for this test set. The AUC for the test set here (0.54) is a lot worse than the AUC from the validation set tested earlier. The best overall model from part 6 (MLP) achieved an AUC of 0.50 on this test set, the same as its AUC on the validation set (0.50), which is normal to see, but a terrible result.
The goal rate for the three logistic regression curves on the test set is more or less the same as on the validation set. The best XGBoost model does not seem to have learned anything, judging by its goal rate curve, which has no clear upward trend; this is similar to the best XGBoost curve on the validation set. The MLP goal rate curve was almost always at 0, indicating that the model did not learn anything from the data that would let it predict class probabilities well.
The logistic regression models tested separately on [distance from net] and [distance and angle from net] appear to have the best cumulative proportion of goals curves, because they are the most curved, indicating that they learned something from the features of the data. The logistic regression model tested on [angle from net] and the best XGBoost model both have curves very similar to y=x, indicating that they did not predict class probabilities well. The MLP was even worse, with a cumulative proportion of goals curve that stayed at almost 0.0 on the test set.
ROC
screenshot


Goal
screenshot


Cumulative
screenshot


Calibration
screenshot

2 - Playoffs


For the Logistic Regression models trained on [distance from net], [angle from net], and [distance and angle from net], testing on the untouched 2019/20 playoff dataset yielded AUCs of 0.68, 0.51, and 0.68, respectively, each 0.01 better than the corresponding validation AUCs of 0.67, 0.50, and 0.67. The best XGBoost model saved in part 5 achieved an AUC value of 0.7 for this test set. The AUC for the test set here (0.54) is a lot worse than the AUC from the validation set tested earlier. The best overall model from part 6 (MLP) achieved an AUC of 0.50 on this test set, the same as its AUC on the validation set (0.50), which is normal to see, but a terrible result.
The goal rate for the three logistic regression curves on the test set is more or less the same as on the validation set. The best XGBoost model does not seem to have learned anything, judging by its goal rate curve, which has no clear upward trend; this is similar to the best XGBoost curve on the validation set. The MLP goal rate curve was almost always at 0, indicating that the model did not learn anything from the data that would let it predict class probabilities well.
The logistic regression models tested separately on [distance from net] and [distance and angle from net] appear to have the best cumulative proportion of goals curves, because they are the most curved, indicating that they learned something from the features of the data. The logistic regression model tested on [angle from net] and the best XGBoost model both have curves very similar to y=x, indicating that they did not predict class probabilities well. The MLP was even worse, with a cumulative proportion of goals curve that stayed at almost 0.0 on the test set.
ROC
screenshot


Goal
screenshot


Cumulative
screenshot


Calibration
screenshot

Milestone 1

Question 1

1.1


Screenshot:
screenshot

Explanation:

Issues:
The players with the highest SV% have very few shots against (SA), so their sample sizes are very small. Larger sample sizes (more SA) would give more reliable SV% figures with a smaller margin of error.

Fix issue:
We can consider only players who have at least the average number of SA, and then sort by SV%.
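The proposed fix can be sketched with pandas; the goalie table below is hypothetical, with player A illustrating the small-sample problem (a perfect SV% from only 2 shots against):

```python
import pandas as pd

# Hypothetical goalie table
df = pd.DataFrame({
    "player": ["A", "B", "C"],
    "SA": [2, 900, 1200],
    "saves": [2, 810, 1100],
})
df["SV%"] = df["saves"] / df["SA"]

# Keep only goalies with at least the average number of shots against,
# then rank by save percentage
filtered = df[df["SA"] >= df["SA"].mean()].sort_values("SV%", ascending=False)
```

Player A's perfect but meaningless SV% is filtered out, leaving C and B ranked by SV%.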

1.2


Screenshot:
screenshot

1.3

In determining a goalie’s performance, other features that could potentially be useful may be Shots Against (SA), Goals-Against Average (GAA), Goalie Point Shares (GPS), and Games Played (GP) and Won (W).

A high SA means a larger sample size for SV%, which gives a more reliable SV% with a smaller margin of error.

GAA calculates the number of goals allowed per 60 minutes played. So, the lower GAA, the better performance of the goalie.

GPS is an estimate of the number of points contributed by a player due to his play in goal. So, beyond metrics about the goals a goalie saved, this metric shows how many points he helped score. The higher the GPS, the better the goalie’s performance.

At the end of the day, what matters after a game is winning. A goalie does a lot more to impact a game than what appears in his post-game stats, such as stopping “dump-ins”. Therefore, a higher win percentage, calculated as W/GP, could also be useful in determining a goalie’s performance.

Question 2

First, we figured out the Game ID naming rules based on the API. Then, we tested the IDs and printed out their corresponding JSON data on the terminal just to see.

screenshot

Next, in our Python code, we loop through years, and regular season and playoffs to get every single game ID. We get each game’s JSON data from the statsapi webpage for each game ID and save the JSON data locally. The downloaded JSON files are saved in this file structure:

JSON_data
│
└───regular_seasons
│   └───2016
│       │   2016020001.json
│       │   ...
│   
└───playoffs
│   └───2016
│       │   2016030111.json
│       |   ...
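The download loop can be sketched as follows. The game-ID construction follows the naming rules described above (matching the example filenames in the tree), while `BASE_URL` and the helper names are assumptions for illustration:

```python
import json
import os
import urllib.request

BASE_URL = "https://statsapi.web.nhl.com/api/v1/game/{}/feed/live"  # assumed endpoint

def game_id(year: int, game_type: str, number: int) -> str:
    """Build a game ID: season year + type code + 4-digit game number."""
    # Regular season games use type code "02", playoffs use "03"
    type_code = {"regular_seasons": "02", "playoffs": "03"}[game_type]
    return f"{year}{type_code}{number:04d}"

def download_game(year: int, game_type: str, number: int, root: str = "JSON_data") -> str:
    """Fetch one game's JSON and save it under root/<game_type>/<year>/."""
    gid = game_id(year, game_type, number)
    out_dir = os.path.join(root, game_type, str(year))
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, f"{gid}.json")
    with urllib.request.urlopen(BASE_URL.format(gid)) as resp:
        data = json.load(resp)
    with open(path, "w") as f:
        json.dump(data, f)
    return path
```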

Question 3


Screenshot:
screenshot

Explanation:
In this debugging tool, the user can select the season from all year options and select the game type between regular_season and playoffs. The game_number slider lets the user select a particular game of the season, and the eventIdx slider selects the event number.
The diagram will then display the coordinates (large blue dot) for where the event happened.

We used matplotlib to draw the coordinates and add the background image, and ipywidgets to add the interactive functionality.

Question 4

4.1

screenshot

4.2

We know that the strength of players on the ice starts off even, so every event is at even strength (5 on 5) until a penalty occurs. When a penalty occurs, the penalized player goes to the penalty box, leaving his team short-handed and giving the other team a power play; the strength changes to 5 on 4. A penalty lasts a certain number of minutes depending on its type, so the strength reverts to even automatically once the penalty expires, and we can check the game time of every event to know when to change the player strength back. The strength also reverts to even if the team on the power play scores a goal.
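The bookkeeping above can be sketched as a tiny state update. This is a simplification (it assumes 2-minute minors and ignores which team is penalized, major penalties, and coincidental minors):

```python
def update_strength(strength, event, penalty_expiry, game_time):
    """Advance the skater-strength state for one event.

    strength: "5x5" (even) or "5x4" (power play)
    event: the event type, e.g. "penalty", "goal", "shot"
    penalty_expiry: game time (seconds) when the active penalty ends, or None
    game_time: game time (seconds) of this event
    """
    # Penalty expired before this event: back to even strength
    if penalty_expiry is not None and game_time >= penalty_expiry:
        strength, penalty_expiry = "5x5", None
    if event == "penalty":
        strength, penalty_expiry = "5x4", game_time + 120  # 2-minute minor
    elif event == "goal" and strength == "5x4":
        # A power-play goal also ends the minor penalty early
        strength, penalty_expiry = "5x5", None
    return strength, penalty_expiry
```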

4.3

In hockey, a rebound occurs when a puck gets shot and bounces off the frame of the goal post. We can speculate that a shot or goal came from a rebound by looking at the time of the event and the event that preceded it: if the previous event was a shot by the same team and happened within, say, 2 seconds, the current shot likely came off a rebound.
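This heuristic can be sketched as a small predicate; the event dictionaries and field names are hypothetical stand-ins for the tidied event data:

```python
def is_rebound(prev_event, curr_event, window=2):
    """Flag a shot as a likely rebound: the previous event was a shot by the
    same team at most `window` seconds earlier."""
    return (prev_event["type"] == "Shot"
            and prev_event["team"] == curr_event["team"]
            and curr_event["time"] - prev_event["time"] <= window)
```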

According to the YouTube video linked in the instructions, a play off the rush appears to be when a single player rushes towards the net he is attacking, with defenders chasing to stop him. I don’t know much about hockey, but I assume this happens when everyone is on one side of the rink and possession suddenly changes (which is why there are no defenders back on their goal side and they end up chasing the rushing player). Possession changes on a giveaway or takeaway event, so we can speculate that a player’s shot or goal came off the rush if the previous event was a giveaway or a takeaway and occurred within a few seconds.

Question 5

5.1

screenshot

Most dangerous type of shot for the offensive team: Wrap-around. It has the lowest shot count (521) and goal count (38), as well as the lowest goal percentage (0.06797853). Therefore, it is dangerous for the offensive team to attempt this type of shot, as it seems to be the least successful.

Most dangerous type of shot for the defensive team: Tip-in. It may not have as many shot and goal counts as the other types of shots, but it has the highest goal percentage (0.18019306). Therefore, it is dangerous for the defensive team if the offensive team plays this type of shot as it seems to be the most successful for the offensive team.

Most common type of shot: Wrist Shot. There were over 30000 combined shots and goals from all shooters and scorers in the 2020-2021 season using the wrist shot.

5.2

2018

screenshot
screenshot

2019

screenshot
screenshot

2020

screenshot
screenshot

Between shot distances of 0 and 75 feet, our “Goal Percentage based on Shot Distance in the X-Y Season” graphs show that goal percentage decreases as shot distance increases, so there is generally a negative correlation between shot distance and goal percentage in the 0-75 foot range. Our “Shot or Goal Counts based on Shot Distance in the X-Y Season” histograms also show very low shot and goal counts beyond 75 feet, so the sample size there is very small and the goal percentage fluctuates drastically from bin to bin; goal percentages beyond 75 feet can therefore mostly be ignored. Across the previous 3 seasons, there is very little change in the overall shape of the “Shot or Goal Counts based on Distance” and “Goal Percentage based on Shot Distance” graphs.

5.3

screenshot



If we connect this figure with our “Shot or Goal Counts based on Shot Distance” histograms from question 5.2, we know that shots taken from beyond 75 feet should mostly be disregarded due to the very small sample size. Looking just at shot types taken between 0-75 feet, we can see a rough negative correlation between shot distance and goal percentage across all shot types, although the relationship is not always clear because some curves have jumps. The slap shot (yellow) has the highest goal percentage between 0-30 feet and a relatively clear negative correlation. The wrap-around has the lowest goal percentages throughout the 0-75 foot range, which would make it the most dangerous type of shot for the offensive team because it yields the fewest goals.

Between 0-30 feet, the most dangerous type of shot for the defensive team is the slap shot (since it’s a fast one (/๑•́o•̀๑)/, and the goalie has less time to react). Beyond 30 feet, tip-in and deflected appear to have the highest goal percentages (but they alternate having the highest goal percentages with increasing shot distance), so are the most dangerous shots for the defensive team. The most dangerous type of shot overall for the offensive team is the wrap around. It has below 0.1 goal percentage up to 20 feet and near 0 goal percentage beyond 30 feet.

Question 6

6.1

To TA: we built this interactive graph using Dash, and then deployed it on Heroku. It runs kind of slowly, so please be patient ༼ ༎ຶ ᆺ ༎ຶ༽.

Check the deployed graph

Once you select ‘year’ and ‘team’ from the dropdown, please wait for up to 20 seconds, then you’ll see the fancy graph! ;)

6.2

First, we translated all shots to target the left net: for every shot aimed at the right net, we rotated the coordinates by 180 degrees. This way, we don’t see a mix of shots taken from both halves of the rink.
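A 180-degree rotation about centre ice just negates both coordinates; a minimal sketch (the function name and the `shooting_right` flag are illustrative):

```python
def normalize_shot(x, y, shooting_right):
    """Map every shot onto the left net. A 180-degree rotation about centre
    ice (0, 0) simply negates both coordinates."""
    return (-x, -y) if shooting_right else (x, y)
```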

We calculated the league average as the total number of shots and goals among all teams, divided by the number of teams, per square foot of the rink. We then tallied the number of shots at every location for a given team. The plot shows the difference between the number of shots and goals a specific team took and the league average, so from these plots we can see how many more or fewer shots a team took from specific locations on the rink compared with the league average for a given season. The plots can be read like a heat map: the darker the shade of pink at a location on the ice, the more shots the team took there compared with the league average; the lighter the shade, the fewer.
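The excess-shot computation can be sketched with NumPy on a tiny hypothetical 2x2 grid of rink locations (30 teams assumed):

```python
import numpy as np

# Hypothetical per-location shot counts: one team vs. the whole league
team_counts = np.array([[4.0, 0.0],
                        [1.0, 2.0]])
league_counts = np.array([[60.0, 10.0],
                          [20.0, 30.0]])
n_teams = 30

league_avg = league_counts / n_teams  # league-average count per location
excess = team_counts - league_avg     # positive: more shots than average (darker pink)
```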

6.3

Season 2016~2017, Team: Colorado Avalanche


screenshot

Season 2020~2021, Team: Colorado Avalanche


screenshot
Looking at the Colorado Avalanche (AVA) shot map for the 2016-2017 season, there are more yellow areas than purple areas near the goal, so we can say they took fewer shots and scored fewer goals near the goal than the league average. Elsewhere on the plot, there are small areas where AVA took more shots than the league average (in purple) and fewer (in yellow, if you look closely), but it is hard to pinpoint anything unique about the rest of the area.

Looking at the AVA shot map for the 2020-2021 season, there is less yellow near the goal than the league average, and also less yellow than in their 2016-2017 shot map. This means they took more shots near the goal in the 2020-2021 season, so their strategy changed.

Looking at the standings for the 2016-2017 season, AVA was 17/31. In 2020-2021, AVA was 1/31 . This makes sense because from question 5.2, we concluded that shots shot from the least distance yielded the highest goal percentage. Shots near the goal post are shots shot from the least distance. Since AVA shot more shots near the goal post in the 2020-2021 season, they should have scored more goals, which makes sense since they became 1st overall in standings in 2020-2021.

6.4

Season 2018~2019

Team: Buffalo Sabres


screenshot

Team: Tampa Bay Lightning


screenshot

Season 2019~2020

Team: Buffalo Sabres


screenshot

Team: Tampa Bay Lightning


screenshot

Season 2020~2021

Team: Buffalo Sabres


screenshot

Team: Tampa Bay Lightning


screenshot


The shot maps of the Buffalo Sabres (BUF) for the 2018-19, 2019-20, and 2020-21 seasons are all very spread out. We can even see above-league-average shot counts from the far corners (x-coord: <20, y-coord: <10), (>60, <10), (<20, >70), (>60, >70) in the 2018-2019 season.

Comparatively, the Tampa Bay Lightning (TBL) took far fewer shots from those corners. It seems very difficult to score shooting from near the goal line, yet BUF still shoots from there. TBL takes more shots around the centre region across the 3 seasons, which we can see from the TBL centre regions having more purple compared to BUF. This could explain their success in scoring more goals, winning more games, and winning the Stanley Cup, since shooting from this region is more direct than from locations farther away or near the goal line. This does not paint the complete picture of BUF’s struggles or TBL’s success, since shot maps only tell us where shots were taken from.

Other factors that affect a team's success include their play style, passing, player movement and switches on the ice, and goal percentage at each shot location.
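The team-versus-league comparison underlying these shot maps can be sketched as a per-game shot-rate difference on a 2D grid. This is a simplified sketch, not our exact shot-map computation: the rink extent, bin count, and Gaussian smoothing width are illustrative choices, and `team_xy` / `league_xy` are assumed arrays of offensive-zone shot coordinates.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def excess_shot_rate(team_xy, league_xy, n_team_games, n_league_games,
                     extent=((-42.5, 42.5), (0, 89)), bins=30, sigma=2):
    """Smoothed per-game shot-rate difference: team minus league average.

    team_xy / league_xy are (N, 2) arrays of shot (x, y) coordinates in
    the offensive zone; positive cells mean the team shoots from that
    location more often than the league average.
    """
    hist_range = [list(extent[0]), list(extent[1])]
    team_h, _, _ = np.histogram2d(team_xy[:, 0], team_xy[:, 1],
                                  bins=bins, range=hist_range)
    league_h, _, _ = np.histogram2d(league_xy[:, 0], league_xy[:, 1],
                                    bins=bins, range=hist_range)
    diff = team_h / n_team_games - league_h / n_league_games
    return gaussian_filter(diff, sigma=sigma)

# Synthetic demo: a team shooting close to the net (y ~ 10) vs a
# league that mostly shoots from further out (y ~ 40)
gen = np.random.default_rng(0)
team_xy = gen.normal([0.0, 10.0], 3.0, size=(200, 2))
league_xy = gen.normal([0.0, 40.0], 3.0, size=(2000, 2))
grid = excess_shot_rate(team_xy, league_xy, n_team_games=10, n_league_games=100)
```

Plotting `grid` with a diverging colormap reproduces the yellow/purple style of the maps above: one sign of the scale marks locations shot from more than the league average, the other sign locations shot from less.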

IFT6758 Demo Post

This post outlines a few more things you may need to know for creating and configuring your blog posts. If you are interested in more general template features or syntax, you can visit the Introducing Lanyon or the Example Content posts.

Configurations

You should modify some of the default values in _config.yml, found in the root directory of this repo. Things like the title, tagline, description, author information, etc. are all fair game to modify. Be more careful when modifying the url information; things can break if done incorrectly (these values are used if you are deploying via GitHub Pages).

Creating Posts

To create a new post in the blog, add a new Markdown file to the _posts/ directory, with the name following the format YYYY-MM-DD-postname.md. Begin the post with the following code:

---
layout: post
title: [POST TITLE]
---

From there, write your content as you would a normal Markdown file. In general, I would recommend writing one sentence per line. This is not required, but this is far easier to work with than having a single giant line of multiple sentences for a single paragraph.
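As a concrete sketch, the steps above can be done from the command line. The date, post name, and title below are placeholders, not an actual post in this repo:

```shell
# Create a new post with the required front matter.
# The date, filename, and title are placeholders.
mkdir -p _posts
cat > _posts/2021-11-15-milestone-2.md <<'EOF'
---
layout: post
title: Milestone 2
---

Write your content here, ideally one sentence per line.
EOF
```

Jekyll picks the post up automatically on the next build, using the date from the filename.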

Interactive plots

Here’s how you can embed interactive figures that have been exported as HTML files. Note that we will be using plotly for this demo, but anything that can export to HTML should work. All that’s required is for you to export your figure into HTML format and make sure that the file exists in the _includes directory in this repository’s root directory. To embed it into any page, simply insert the following code anywhere into your page.

{% include [FIGURE_NAME].html %} 

For example, the following code can be used to generate the figure underneath it.

import pandas as pd
import plotly.express as px

# Load plotly's earthquakes demo dataset (~23k rows)
df = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/earthquakes-23k.csv')

# Density heatmap over a world map, weighted by earthquake magnitude
fig = px.density_mapbox(df, lat='Latitude', lon='Longitude', z='Magnitude', radius=10,
                        center=dict(lat=0, lon=180), zoom=0,
                        mapbox_style="stamen-terrain")
fig.show()

# Export to HTML so it can be embedded with the include tag above
fig.write_html('./_includes/plotly_demo_1.html')

The above figure is pretty cool, but you can also embed heavier/more complex figures. For brevity, the following figure is generated from the included plotly_html.ipynb notebook file in the repo’s root directory.

Introducing Lanyon

Lanyon is an unassuming Jekyll theme that places content first by tucking away navigation in a hidden drawer. It’s based on Poole, the Jekyll butler.

Built on Poole

Poole is the Jekyll Butler, serving as an upstanding and effective foundation for Jekyll themes by @mdo. Poole, and every theme built on it (like Lanyon here) includes the following:

  • Complete Jekyll setup included (layouts, config, 404, RSS feed, posts, and example page)
  • Mobile friendly design and development
  • Easily scalable text and component sizing with rem units in the CSS
  • Support for a wide gamut of HTML elements
  • Related posts (time-based, because Jekyll) below each post
  • Syntax highlighting, courtesy Pygments (the Python-based code snippet highlighter)

Lanyon features

In addition to the features of Poole, Lanyon adds the following:

  • Toggleable sliding sidebar (built with only CSS) via link in top corner
  • Sidebar includes support for textual modules and a dynamically generated navigation with active link support
  • Two orientations for content and sidebar, default (left sidebar) and reverse (right sidebar), available via <body> classes
  • Eight optional color schemes, available via <body> classes

Head to the readme to learn more.

Browser support

Lanyon is by preference a forward-thinking project. In addition to the latest versions of Chrome, Safari (mobile and desktop), and Firefox, it is only compatible with Internet Explorer 9 and above.

Download

Lanyon is developed on and hosted with GitHub. Head to the GitHub repository for downloads, bug reports, and feature requests.

Thanks!

Example content

Howdy! This is an example blog post that shows several types of HTML content supported in this theme.

Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Aenean eu leo quam. Pellentesque ornare sem lacinia quam venenatis vestibulum. Sed posuere consectetur est at lobortis. Cras mattis consectetur purus sit amet fermentum.

Curabitur blandit tempus porttitor. Nullam quis risus eget urna mollis ornare vel eu leo. Nullam id dolor id nibh ultricies vehicula ut id elit.

Etiam porta sem malesuada magna mollis euismod. Cras mattis consectetur purus sit amet fermentum. Aenean lacinia bibendum nulla sed consectetur.

Inline HTML elements

HTML defines a long list of available inline tags, a complete list of which can be found on the Mozilla Developer Network.

  • To bold text, use <strong>.
  • To italicize text, use <em>.
  • Abbreviations, like HTML should use <abbr>, with an optional title attribute for the full phrase.
  • Citations, like — Mark Otto, should use <cite>.
  • Deleted text should use <del> and inserted text should use <ins>.
  • Superscript text uses <sup> and subscript text uses <sub>.

Most of these elements are styled by browsers with few modifications on our part.

Heading

Vivamus sagittis lacus vel augue rutrum faucibus dolor auctor. Duis mollis, est non commodo luctus, nisi erat porttitor ligula, eget lacinia odio sem nec elit. Morbi leo risus, porta ac consectetur ac, vestibulum at eros.

Code

Cum sociis natoque penatibus et magnis dis code element montes, nascetur ridiculus mus.

// Example can be run directly in your JavaScript console

// Create a function that takes two arguments and returns the sum of those arguments
var adder = new Function("a", "b", "return a + b");

// Call the function
adder(2, 6);
// > 8

Aenean lacinia bibendum nulla sed consectetur. Etiam porta sem malesuada magna mollis euismod. Fusce dapibus, tellus ac cursus commodo, tortor mauris condimentum nibh, ut fermentum massa.

Lists

Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Aenean lacinia bibendum nulla sed consectetur. Etiam porta sem malesuada magna mollis euismod. Fusce dapibus, tellus ac cursus commodo, tortor mauris condimentum nibh, ut fermentum massa justo sit amet risus.

  • Praesent commodo cursus magna, vel scelerisque nisl consectetur et.
  • Donec id elit non mi porta gravida at eget metus.
  • Nulla vitae elit libero, a pharetra augue.

Donec ullamcorper nulla non metus auctor fringilla. Nulla vitae elit libero, a pharetra augue.

  1. Vestibulum id ligula porta felis euismod semper.
  2. Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus.
  3. Maecenas sed diam eget risus varius blandit sit amet non magna.

Cras mattis consectetur purus sit amet fermentum. Sed posuere consectetur est at lobortis.

HyperText Markup Language (HTML)
The language used to describe and define the content of a Web page
Cascading Style Sheets (CSS)
Used to describe the appearance of Web content
JavaScript (JS)
The programming language used to build advanced Web sites and applications

Integer posuere erat a ante venenatis dapibus posuere velit aliquet. Morbi leo risus, porta ac consectetur ac, vestibulum at eros. Nullam quis risus eget urna mollis ornare vel eu leo.

Tables

Aenean lacinia bibendum nulla sed consectetur. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Name    | Upvotes | Downvotes
------- | ------- | ---------
Totals  | 21      | 23
Alice   | 10      | 11
Bob     | 4       | 3
Charlie | 7       | 9

Nullam id dolor id nibh ultricies vehicula ut id elit. Sed posuere consectetur est at lobortis. Nullam quis risus eget urna mollis ornare vel eu leo.


Want to see something else added? Open an issue.